12. DataFrame apply()
DataFrame apply()
Question:
Start Quiz:
import pandas as pd
grades_df = pd.DataFrame(
data={'exam1': [43, 81, 78, 75, 89, 70, 91, 65, 98, 87],
'exam2': [24, 63, 56, 56, 67, 51, 79, 46, 72, 60]},
index=['Andre', 'Barry', 'Chris', 'Dan', 'Emilio',
'Fred', 'Greta', 'Humbert', 'Ivan', 'James']
)
# Change False to True for this block of code to see what it does
# DataFrame apply()
if False:
def convert_grades_curve(exam_grades):
# Pandas has a bult-in function that will perform this calculation
# This will give the bottom 0% to 10% of students the grade 'F',
# 10% to 20% the grade 'D', and so on. You can read more about
# the qcut() function here:
# http://pandas.pydata.org/pandas-docs/stable/generated/pandas.qcut.html
return pd.qcut(exam_grades,
[0, 0.1, 0.2, 0.5, 0.8, 1],
labels=['F', 'D', 'C', 'B', 'A'])
# qcut() operates on a list, array, or Series. This is the
# result of running the function on a single column of the
# DataFrame.
print convert_grades_curve(grades_df['exam1'])
# qcut() does not work on DataFrames, but we can use apply()
# to call the function on each column separately
print grades_df.apply(convert_grades_curve)
def standardize(df):
'''
Fill in this function to standardize each column of the given
DataFrame. To standardize a variable, convert each value to the
number of standard deviations it is above or below the mean.
'''
return None
Solution:
INSTRUCTOR NOTE:
Note: In order to get the proper computations, we should actually be setting the value of the "ddof" parameter to 0 in the .std() function.
Note that the type of standard deviation calculated by default is different between numpy's .std() and pandas' .std() functions. By default, numpy calculates a population standard deviation, with "ddof = 0". On the other hand, pandas calculates a sample standard deviation, with "ddof = 1". If we know all of the scores, then we have a population - so to standardize using pandas, we need to set "ddof = 0".